Large Object Oriented Knowledge Bases: A Challenge for Reasoning Engines
Abstract
We announce the availability of KB Bio 101 for research purposes. We explain the origins of this KB and identify the research problems it poses for state-of-the-art answer set solvers, first-order theorem provers, and description logic reasoners.

The Knowledge Base KB Bio 101

The goal of Project Halo¹ is to develop a “Digital Aristotle”: a reasoning system capable of answering novel questions and solving advanced problems in a broad range of scientific disciplines and related human affairs. As part of this effort, SRI has created a knowledge base called KB Bio 101 that represents knowledge from a textbook used in advanced high school and introductory college biology courses. KB Bio 101 contains a concept taxonomy for the whole textbook and detailed rules for 20 of its chapters. SRI has tested the educational usefulness of this knowledge base in the context of an electronic version of the book used by students studying from it².

KB Bio 101 was originally developed using a knowledge representation and reasoning system called Knowledge Machine (KM) (Clark and Porter 2011). KM supports a variety of representation features, including a facility to define classes, organize them into a hierarchy, and define partitions; the ability to define relations (also known as slots) and organize them into a relation hierarchy; support for nominals; a facility to define Horn rules; a procedural language; a situation mechanism; and a STRIPS representation for actions. KM reasons using inheritance, description-logic-style classification of individuals, backward chaining over rules, and heuristic unification. KM supports paraconsistent reasoning³ in the sense that it can keep reasoning even when the KB contains inconsistencies. In addition, KM can use its situation mechanism and STRIPS representation of actions to simulate the execution of those actions.
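To make the combination of inheritance and backward chaining more concrete, here is a minimal Python sketch. It is not KM's implementation: the toy taxonomy, the facts, the relation name `contains-organelle`, and the function `prove` are all hypothetical, and the rules are ground (variable-free) Horn rules, whereas KM's rules are first-order.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

# An atom is (relation, subject, object), e.g. ("isa", "m1", "Organelle").
Atom = Tuple[str, str, str]

# Hypothetical toy taxonomy (not exported from KB Bio 101): class -> direct superclasses.
SUPERCLASSES: Dict[str, List[str]] = {
    "Mitochondrion": ["Organelle"],
    "Organelle": ["Cell-Part"],
    "Cell-Part": [],
}

# Hypothetical ground facts about individuals.
FACTS: Set[Atom] = {
    ("isa", "m1", "Mitochondrion"),
    ("has-part", "c1", "m1"),
}

# Hypothetical ground Horn rules: the head holds if every atom in the body holds.
RULES: List[Tuple[Atom, List[Atom]]] = [
    (("contains-organelle", "c1", "m1"),
     [("has-part", "c1", "m1"), ("isa", "m1", "Organelle")]),
]


def prove(goal: Atom, visited: FrozenSet[Atom] = frozenset()) -> bool:
    """Backward chaining: try stored facts, then class inheritance, then rules."""
    if goal in FACTS:
        return True
    if goal in visited:  # cut cycles caused by circular definitions
        return False
    visited = visited | {goal}
    rel, subj, obj = goal
    # Inheritance: subj is an instance of obj if it is an instance of a direct subclass of obj.
    if rel == "isa":
        for sub, supers in SUPERCLASSES.items():
            if obj in supers and prove(("isa", subj, sub), visited):
                return True
    # Horn rules: reduce the goal to the atoms in the body of a rule whose head matches it.
    for head, body in RULES:
        if head == goal and all(prove(atom, visited) for atom in body):
            return True
    return False


print(prove(("isa", "m1", "Cell-Part")))           # True, via two inheritance steps
print(prove(("contains-organelle", "c1", "m1")))  # True, via the rule
print(prove(("isa", "c1", "Organelle")))          # False
```

The rule for `contains-organelle` succeeds only because its body atom `("isa", "m1", "Organelle")` is itself derived by inheritance, which is the kind of interplay between the class hierarchy and rules that a reasoner over the exported KB has to reproduce.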
While the project team has experimented with all of these features, KB Bio 101 does not use the STRIPS features of KM.

¹ http://www.projecthalo.com/
² http://www.aaaivideos.org/2012/inquire_intelligent_textbook/
³ http://plato.stanford.edu/entries/logic-paraconsistent/

We have just completed work to export KB Bio 101 into a variety of standard declarative languages, for example, first-order logic with equality (Fitting 1996), SILK (Grosof 2009), description logics (Baader et al. 2007), and logic programming under the answer set semantics (Gelfond and Lifschitz 1990). KB Bio 101 is encoded as an object-oriented knowledge base (Chaudhri et al. 2013a). The current KB has more than 6000 classes, 6500 subclass and disjointness relationships in the class hierarchy, and several hundred thousand rules (axioms). KB Bio 101 is now freely available for research purposes⁴.

Reasoning Problems in KB Bio 101

The basic reasoning problems in KB Bio 101 fall into the following types:

• Q1: Querying about classes and subclass relations
• Q2: Querying about properties of individuals
• Q3: Comparing individuals between classes
• Q4: Searching for a path with some specified relations between two classes

The first two types of queries focus on the taxonomic hierarchy described by the KB, and the last two on the relationships between individuals of one or more classes. Each query type can have numerous question templates. For example, some question templates for the first query type are: What are the subclasses of X? Is it true that class X is a subclass of class Y? Is it true that X and Y are disjoint? Defining numerous question templates for each query type lets us capture a large space of the queries that users are interested in asking of KB Bio 101. For example:

• is it true that a cell with a nucleus is a prokaryotic cell?
• what are the types of exergonic reactions?
• what are organelle parts of a cell?
• describe the differences and similarities between mitochondria and chloroplasts;
• what process provides raw materials for the citric acid cycle during cellular respiration?
• in the absence of oxygen, yeast cells can obtain energy by which process?

The detailed input and a possible way of computing the output for each of these queries are given in (Chaudhri et al. 2013b). The current reasoning on KB Bio 101 is done using KM, which implements special-purpose algorithms for answering queries of types Q1-Q4. For some queries, only approximate answers are provided. The current reasoning engine also employs a heuristic called unification mapping (UMAP) to unify terms representing objects (Chaudhri and Son 2012).

⁴ http://www.ai.sri.com/~halo/public/exported-kb/biokb.html

Challenges for AI-Reasoners

Reasoning in KB Bio 101 poses a challenge for state-of-the-art AI reasoning engines for the following reasons:

• KB Bio 101 contains rules with function symbols for which the grounding is infinite. A simple example is a KB consisting of a single class person, a single relation has-parent, and a statement of the form “for each person there exists an instance of the has-parent relation between this person and another individual who is also a person”. The skolemized versions of such statements require function symbols. An obvious first challenge is therefore to develop suitable grounding techniques.
• The rules in KB Bio 101 can define necessary and sufficient properties of a class that are structured as general graphs rather than trees. Furthermore, class definitions can be circular in that they can refer to each other. The use of graph structures in class descriptions frequently causes undecidability in description logic systems (Motik et al. 2009). Therefore, answering queries of types Q1 and Q2 is likely to be intractable.
• Even though the rules in KB Bio 101 follow a small number of axiom templates, the size of the KB suggests that reasoning over it could be a non-trivial task for state-of-the-art reasoners.
• KB Bio 101 contains more than 100,000 non-ground rules specifying equality between individual terms. This is because KB Bio 101 is a fully specified knowledge base in the sense discussed in (Chaudhri and Son 2012). Computing these rules for each export is a time-consuming and complex process. Furthermore, the approach is not elaboration tolerant, in the sense that these equality relations must be maintained as the knowledge base is updated. A better approach to dealing with underspecification is unification mapping (UMAP), as proposed in (Chaudhri and Son 2012). Consider the KB obtained from KB Bio 101 by removing the rules that specify the equality relation. The rules developed for UMAP aim at enforcing the following principles:
  (P1) Specificity Principle: in selecting terms for the construction of umap-atoms, more specific terms should be preferred over less specific ones.
  (P2) Specialization Principle: given a relation s and a class c, the application of the specificity principle should be limited to at most one possible value of s at c. Furthermore, if the application of (P1) does not violate (P2), then (P1) should be applied.
  (P3) Redundancy Principle: in the presence of multiple specifications of a relation for an individual, the most-specific relation specification overrides less-specific ones.
  (P4) Consistency Principle: if a unification between x and y takes place at class c, then it should be applied in every slot of class c.
  Answering queries of types Q1-Q4 in this reduced KB would require reasoning with the UMAP rules, which is computationally intensive because UMAP must consider the combinatorics of equating different individuals across the class hierarchy.
• The reasoning tasks of computing the differences between two concepts and finding relationships between two individuals are computationally intensive. Previous implementations of these tasks rely on graph algorithms and trade completeness for efficiency. These tasks will present a tough challenge to any reasoner.
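The grounding challenge can be made concrete with the has-parent example. In first-order logic the statement reads ∀x (person(x) → ∃y (has-parent(x, y) ∧ person(y))); skolemizing it with a new function symbol f gives the rules person(x) → has-parent(x, f(x)) and person(x) → person(f(x)). The Python sketch below grounds these two rules starting from a single person. The function name `ground_has_parent` and the depth bound `max_depth` are our own assumptions, added only so the loop terminates; a fixed depth cutoff is one naive way to truncate an infinite grounding, not a technique proposed here.

```python
from typing import List


def ground_has_parent(individual: str, max_depth: int) -> List[str]:
    """Ground the skolemized rules
         person(x) -> has-parent(x, f(x))
         person(x) -> person(f(x))
    starting from person(individual), stopping after max_depth applications.
    Without the depth bound the loop would never terminate."""
    atoms = [f"person({individual})"]
    term = individual
    for _ in range(max_depth):
        parent = f"f({term})"  # Skolem term naming the parent asserted by the existential
        atoms.append(f"has-parent({term}, {parent})")
        atoms.append(f"person({parent})")
        term = parent  # the new person triggers both rules again
    return atoms


for atom in ground_has_parent("alice", 2):
    print(atom)
# person(alice)
# has-parent(alice, f(alice))
# person(f(alice))
# has-parent(f(alice), f(f(alice)))
# person(f(f(alice)))
```

Each new term f(t) is again a person, so every extra level adds two more ground atoms and the grounding never reaches a fixpoint. Any practical grounder for the exported KB therefore needs something smarter than naive instantiation.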
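Finding relationships between classes (query type Q4) can be sketched as a graph search. The following Python example is a minimal sketch under stated assumptions: the graph, its class names, the relation names `result` and `raw-material-of`, and the function `find_path` are hypothetical and are not taken from the KB export. It runs a breadth-first search that follows only the allowed relations and returns a shortest path, loosely mirroring the question about which process provides raw materials for the citric acid cycle.

```python
from collections import deque
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical toy graph (not exported from KB Bio 101): class -> [(relation, target class)].
GRAPH: Dict[str, List[Tuple[str, str]]] = {
    "Glycolysis": [("result", "Pyruvate")],
    "Pyruvate": [("raw-material-of", "Pyruvate-Oxidation")],
    "Pyruvate-Oxidation": [("result", "Acetyl-CoA")],
    "Acetyl-CoA": [("raw-material-of", "Citric-Acid-Cycle")],
}


def find_path(graph: Dict[str, List[Tuple[str, str]]], start: str, goal: str,
              allowed: Set[str]) -> Optional[List[Tuple[str, str, str]]]:
    """Breadth-first search for a shortest path from start to goal that uses only
    relations in `allowed`. Returns (source, relation, target) steps, or None."""
    parent: Dict[str, Tuple[str, str]] = {}  # node -> (previous node, relation used)
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk the parent links back to the start, then reverse them.
            steps: List[Tuple[str, str, str]] = []
            while node != start:
                prev, rel = parent[node]
                steps.append((prev, rel, node))
                node = prev
            return steps[::-1]
        for rel, nxt in graph.get(node, []):
            if rel in allowed and nxt not in seen:
                seen.add(nxt)
                parent[nxt] = (node, rel)
                queue.append(nxt)
    return None


print(find_path(GRAPH, "Glycolysis", "Citric-Acid-Cycle", {"result", "raw-material-of"}))
print(find_path(GRAPH, "Glycolysis", "Citric-Acid-Cycle", {"result"}))  # None
```

On a graph this small, breadth-first search is complete. The difficulty described above arises when such searches run over a KB with thousands of classes and rules, where earlier implementations gave up completeness for efficiency.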